Accurate quantum state estimation via "Keeping the experimentalist honest" 
In this article, we derive a unique procedure for quantum state estimation from a simple, self-
evident principle: an experimentalist's estimate of the quantum state generated by an apparatus
should be constrained by honesty. A skeptical observer should subject the estimate to a test that 
guarantees that a self-interested experimentalist will report the true state as accurately as possible. 
We also find a non-asymptotic, operational interpretation of the quantum relative entropy function. 



Consider a source of quantum states such as a laser,
or an ion trap with a preparation procedure. Quantum
state estimation is the problem of deducing what state
it emits by analyzing the outcomes of measurements on
many instances. The usual procedure for state estimation
is quantum state tomography, together with some
variant of maximum-likelihood estimation to ensure
positivity. The obvious goal is an estimate "close" to the
true state. Different metrics, such as fidelity, relative
entropy, trace norm, or Hilbert-Schmidt norm, will
favor different estimation procedures.

Here, we derive an optimal state-estimation procedure 
by first identifying quantum relative entropy as a unique 
metric for characterizing an estimate's "goodness". Our
procedure is broadly adaptable to (1) arbitrary prior 
knowledge (or ignorance) and (2) arbitrary measurement 
procedures. 

Keeping the Experimentalist Honest: Implicit
in the idea of state estimation is the assumption that
some estimates are better than others. Suppose that σ
is an estimate of the true state ρ, and that f(ρ : σ) is
a measure of how "good" an estimate σ is. We propose
that this measure should obey three principles:

1. The best estimate of ρ is ρ itself. If f(ρ : σ) measures
how well σ estimates ρ, then f(ρ : ρ) > f(ρ : σ)
for all σ ≠ ρ.

2. f(p : a) should correspond to some operational 
test, as the payoff or cost of some experimental pro- 
cedure. 

3. The "reward" for correctly predicting an event 
should depend only on the predicted probability 
for that event. This is a version of the likelihood 
principle (see [13]).

Remarkably, these simple assumptions single out one
measure: the relative entropy between ρ and σ, or
S(ρ‖σ) = Tr(ρ ln ρ − ρ ln σ). It arises as the expected
payoff in a type of game between a cash-strapped
experimentalist and her employer.

Alice, an ambitious scientist attempting to build a
quantum computer, produces states that she believes are
described by the density operator ρ. She informs her
employer, Bob, that she has produced the state σ. Bob, a
conscientious scientific administrator, would like to
ensure that Alice does not lie - that σ = ρ. He will
periodically visit Alice's lab and measure one of her states, in a
way that may depend on her estimate σ. Her future funding
will depend on the outcomes of these measurements.
What measurement should Bob perform, and how should
he pay Alice, so that she has no incentive to deceive him?

We propose that Bob should measure in a basis
{|f_i⟩}, i = 1 … n, that diagonalizes σ = Σ_i s_i |f_i⟩⟨f_i|. Upon
getting outcome i, he should pay Alice R_i = C + D ln s_i
dollars, where C and D are non-negative constants. We
denote this as the "honest experimentalist reward scheme,"
or HERS.

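HERS motivates honesty: Bob's measurement
yields outcome i with probability p_i = Tr(ρ|f_i⟩⟨f_i|).
Alice's expected reward is

    \bar{R} = \sum_{i=1}^{n} p_i R_i = C + D \sum_{i=1}^{n} p_i \ln s_i.    (1)

Rewriting the last term as Σ_i p_i ln s_i = Tr(ρ ln σ) yields

    R(\rho : \sigma) = C + D\,\mathrm{Tr}(\rho \ln \sigma)
                     = C + D\,[\mathrm{Tr}(\rho \ln \rho) - (\mathrm{Tr}(\rho \ln \rho) - \mathrm{Tr}(\rho \ln \sigma))]
                     = C - D\,[H(\rho) + S(\rho\|\sigma)]    (2)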

Since C, D, and ρ are fixed, Alice maximizes her expected
reward by reporting a σ that minimizes the relative
entropy S(ρ‖σ). Because S(ρ‖σ) ≥ 0, with equality only
when σ = ρ, this constrains σ to be ρ itself. Alice
is thereby motivated to be honest. She is also motivated
to produce pure states - but not to lie about how pure
the actual state is.
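As a numerical illustration of Eq. (2), the following minimal sketch evaluates the HERS payoff for a single qubit. It assumes that the true state ρ and the reported state σ are parameterized by Bloch vectors, which is a convenience of this example and not part of the derivation; the function name `hers_expected_reward` and the sample states are hypothetical.

```python
import math

def hers_expected_reward(r, s, C=0.0, D=1.0):
    """Expected HERS payoff C + D * sum_i p_i ln s_i for one qubit.

    r: Bloch vector of the true state rho (|r| <= 1).
    s: Bloch vector of the reported state sigma (|s| <= 1).
    Bob measures in the eigenbasis of sigma. Its eigenvalues are
    s_pm = (1 +/- |s|)/2 and the outcome probabilities are
    p_pm = (1 +/- r.s_hat)/2, where s_hat = s/|s|.
    """
    s_norm = math.sqrt(sum(x * x for x in s))
    if s_norm == 0.0:
        # sigma is maximally mixed: every basis diagonalizes it, s_i = 1/2
        return C + D * math.log(0.5)
    overlap = sum(a * b for a, b in zip(r, s)) / s_norm
    total = C
    for sign in (+1, -1):
        p_i = 0.5 * (1 + sign * overlap)  # probability Bob sees outcome i
        s_i = 0.5 * (1 + sign * s_norm)   # probability Alice predicted
        if p_i <= 0.0:
            continue  # outcome never occurs, so it contributes nothing
        if s_i <= 0.0:
            return -math.inf  # Alice ruled out an outcome that can occur
        total += D * p_i * math.log(s_i)
    return total

true_state = (0.3, 0.0, 0.6)  # hypothetical mixed qubit state
honest = hers_expected_reward(true_state, true_state)
wrong_axis = hers_expected_reward(true_state, (0.0, 0.0, 0.6))
too_pure = hers_expected_reward(true_state, (0.3, 0.0, 0.9))
print(honest, wrong_axis, too_pure)
```

Under these assumptions the honest report earns more than either misreport, and reporting a pure state when the true state is mixed yields an expected payoff of minus infinity.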

HERS is unique: Unless Bob can do non-projective 
POVMs [31], this turns out to be the only verification
procedure that satisfies our three criteria. 

In classical statistics, a reward scheme for a probabilistic
forecast is a scoring rule. It assigns an average reward
R(P : Q) to a forecast Q when events are distributed
according to P. A reward that is uniquely maximized by an
honest forecast is a strictly proper scoring rule or SPSR
(see review [10]). Given some P, the maximum reward
under such a rule is P's value, G(P) = R(P : P). Savage
showed that for every SPSR, G(P) is strictly convex [11].

To consider the quantum case, we observe that a measurement
transforms a state ρ into a probability distribution
{p_i} over outcomes, to which we can apply a scoring
rule. We represent a projective measurement of basis B
as a quantum channel β. For any state ρ, let β[ρ] = P. P
is a diagonal matrix of probabilities, and β simply annihilates
off-diagonal elements. Let G(ρ) be the maximum
of G(β[ρ]) over all β. Since the eigenvalues of ρ majorize
those of β[ρ] [12], and G is convex, this maximum
is achieved when β[ρ] = ρ. Thus G(ρ) = G({λ_i}), where
{λ_i} are the eigenvalues of ρ.
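The majorization argument can be checked numerically for a qubit. The sketch below is illustrative only: it assumes the logarithmic score with C = 0 and D = 1, whose value function is G(P) = Σ_i p_i ln p_i, and it again uses a Bloch-vector parameterization. All function names are hypothetical.

```python
import math

def value_G(probs):
    """Value G(P) = R(P : P) of the logarithmic score: sum_i p_i ln p_i (convex in P)."""
    return sum(p * math.log(p) for p in probs if p > 0.0)

def dephased_probabilities(r, n_hat):
    """Diagonal of beta[rho] for a qubit with Bloch vector r measured along unit vector n_hat."""
    dot = sum(a * b for a, b in zip(r, n_hat))
    return [0.5 * (1 + dot), 0.5 * (1 - dot)]

def eigenvalues(r):
    """Eigenvalues of the qubit state with Bloch vector r."""
    norm = math.sqrt(sum(x * x for x in r))
    return [0.5 * (1 + norm), 0.5 * (1 - norm)]

r = (0.3, 0.0, 0.6)  # hypothetical mixed state
r_norm = math.sqrt(sum(x * x for x in r))
eigen_axis = tuple(x / r_norm for x in r)  # basis that diagonalizes rho
axes = {"x": (1.0, 0.0, 0.0), "z": (0.0, 0.0, 1.0), "eigen": eigen_axis}

best = value_G(eigenvalues(r))
values = {name: value_G(dephased_probabilities(r, n)) for name, n in axes.items()}
for name, v in values.items():
    print(name, v, "<=", best)
```

Measuring along any axis other than the eigenbasis of ρ yields a flatter distribution and hence a smaller G, which is the content of the statement that the maximum is achieved when β[ρ] = ρ.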

Lemma 1. Given a physical state ρ and an estimate σ,
Bob can ensure Alice's honesty by applying a SPSR to the
probabilities for a measurement of basis B if and only if
σ is diagonal in B.

Proof: Represent Bob's σ-dependent measurement as
a CP-map β_σ that annihilates off-diagonal elements in
basis B. Let the SPSR yield a value G.

1. Suppose that B diagonalizes σ, so β_σ[σ] = σ. Then
Alice's expected reward is

    R(\beta_\sigma[\rho] : \beta_\sigma[\sigma]) \le G(\beta_\sigma[\rho]) \le G(\rho).    (3)

The inequalities are simultaneously saturated if and only
if σ = ρ, in which case Alice earns the full value G(ρ) of
her state. When σ ≠ ρ, one or both of the inequalities is
strict, so Alice earns strictly less than G(ρ). Thus, Alice
maximizes her reward uniquely by reporting σ = ρ.

2. Suppose that there exists a σ so that β_σ[σ] ≠ σ.
Then let ρ = β_σ[σ]. Since β_σ is idempotent,
β_σ[ρ] = ρ = β_σ[σ], and Alice's expected reward is

    R(\beta_\sigma[\rho] : \beta_\sigma[\sigma]) = R(\rho : \rho) = G(\rho),    (4)

whereas if she (truthfully) reports ρ, she can expect

    R(\beta_\rho[\rho] : \beta_\rho[\rho]) = G(\beta_\rho[\rho]) \le G(\rho),    (5)

where the inequality holds because ρ's eigenvalues majorize
those of β_ρ[ρ] (by Schur's theorem) and because
G is convex. Alice expects at least as much reward for
reporting σ ≠ ρ as for reporting ρ, so her honesty is not ensured. □

So far we have demanded only that our scoring rule
be strictly proper. We now demand that Alice's reward
depend only on her predicted probability for the observed
event. In other words, R_i({s_j}) = R_i(s_i). This reflects
the Likelihood Principle [13]: all the relevant information
in an event is contained in the likelihood of the hypothesis
(here, p(i|σ)). How the experiment was performed is irrelevant.
In particular, this avoids any argument between
Alice and Bob about how to describe the outcome(s) that
did not occur.

A remarkable theorem by Aczel then restricts
the form of the reward function R_i(s_i).



Theorem 1 (Aczel [14]). Let n ≥ 3. The inequality

    \sum_{i=1}^{n} p_i R_i(q_i) \le \sum_{i=1}^{n} p_i R_i(p_i)    (6)

is satisfied for all n-point probability distributions
(p_1 … p_n) and (q_1 … q_n) if and only if there exist constants
C_1 … C_n and D such that for all i ∈ [1 … n],

    R_i(p) = D \ln p + C_i.    (7)
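To make inequality (6) concrete, the following sketch compares the logarithmic rule of Eq. (7) with a linear rule R(q) = q on a three-outcome distribution. The linear rule is chosen here only as a counterexample, the same rule is used for every outcome i, and the distributions and function names are hypothetical.

```python
import math

def expected_score(p, q, rule):
    """Average reward sum_i p_i R(q_i) when events follow p and the forecast is q."""
    return sum(p_i * rule(q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)

def log_rule(q_i, C=0.0, D=1.0):
    """Reward of the form D ln q + C, as in Eq. (7), with a common constant C."""
    return C + D * math.log(q_i)

def linear_rule(q_i):
    """Reward equal to the predicted probability; depends only on q_i but is not proper."""
    return q_i

p = [0.5, 0.3, 0.2]                # true distribution
overconfident = [0.9, 0.05, 0.05]  # exaggerated forecast

log_honest = expected_score(p, p, log_rule)
log_lie = expected_score(p, overconfident, log_rule)
lin_honest = expected_score(p, p, linear_rule)
lin_lie = expected_score(p, overconfident, linear_rule)

print("log rule:    honest", log_honest, "overconfident", log_lie)
print("linear rule: honest", lin_honest, "overconfident", lin_lie)
```

With the logarithmic rule the honest forecast earns more, while the linear rule rewards the exaggerated forecast, so it violates inequality (6) even though it depends only on the predicted probability of the observed event.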

In the scenario we consider, there is even less freedom.
Aczel's theorem allows the constants C_i to depend on i.
In the quantum setting, all the C_i must be equal to a
fixed C independent of i (see proof in Appendix A). We
have therefore proved the following:

Theorem 2 (The honest experimentalist). Let A
be a quantum system with dimension n ≥ 3. Let ρ and
σ be density operators for A, and let {|g_i⟩} be an
orthonormal basis for A that depends only on σ. Defining
p_i = ⟨g_i|ρ|g_i⟩ and s_i = ⟨g_i|σ|g_i⟩, suppose that

    \sum_i p_i R_i(s_i) \le \sum_i p_i R_i(p_i)    (8)

is satisfied for all ρ and σ, with equality if and only if ρ = σ.
Then {|g_i⟩} diagonalizes σ, and there exist constants
C and D such that R_i(s_i) = C + D ln s_i for all i.

The scheme outlined (HERS) is uniquely specified by
Bob's need to guarantee Alice's honesty through self-interest.
Relative entropy, S(ρ‖σ), appears naturally as
the amount of money that Alice can expect to lose by
lying. Any "boss" who wishes never to be lied to must
use the HERS payment scheme.

Of course, rewards in the real world are often structured
less wisely. We propose, however, that a maximally
ethical scientist should act as if she were being motivated
by HERS, and use S(ρ‖σ) as the universal measure of
honesty. Hereafter, we will assume that the experimentalist
is, in fact, honest.

The Uncertain Experimentalist: What does
"honesty" mean for an experimentalist who is not certain
of ρ? We assert that she should behave as if she were
guided by a tangible, strictly proper reward scheme.
HERS is an excellent candidate, but our proofs hold for
any strictly proper scheme.

Suppose that Alice does not know ρ, but knows that it
will be selected from an ensemble π(ρ)dρ (or simply π(ρ)
hereafter, for clarity). Equivalently, she thinks the true
state is ρ with probability π(ρ). Her expected reward
(from HERS) for reporting σ is:

    \bar{R} = \int R(\rho : \sigma)\,\pi(\rho)\,d\rho
            = C - D \left( \int H(\rho)\,\pi(\rho)\,d\rho + \int S(\rho\|\sigma)\,\pi(\rho)\,d\rho \right)
            = \mathrm{const}_\sigma + D \int \mathrm{Tr}(\rho \ln \sigma)\,\pi(\rho)\,d\rho
            = \mathrm{const}_\sigma + D\,\mathrm{Tr}(\bar{\rho} \ln \sigma) = \mathrm{const}'_\sigma - D\,S(\bar{\rho}\|\sigma),

where ρ̄ = ∫ ρ π(ρ) dρ. The "const_σ" terms are independent
of σ and therefore out of Alice's control. Therefore,
Alice maximizes her honesty by reporting the mean of
her probability distribution.

The uniqueness of HERS depends on the Likelihood
Principle. However, the mean of the probability distribution
is maximally honest for any strictly proper scoring
rule:

Theorem 3. Let Alice believe that ρ is selected from a
distribution π(ρ). Let her expected reward for reporting
σ be R(ρ : σ) = Σ_i p_i R_i(σ), where p_i = Tr(E_i ρ), and
R(ρ : ρ) > R(ρ : σ) for all σ ≠ ρ. Alice maximizes her
expected reward by reporting σ = ρ̄ = ∫ ρ π(ρ) dρ.

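Proof: Since Alice expects ρ to appear with probability
π(ρ), her expected reward is:

    \bar{R} = \int R(\rho : \sigma)\,\pi(\rho)\,d\rho    (9)
            = \int \sum_i \mathrm{Tr}(E_i \rho)\,R_i(\sigma)\,\pi(\rho)\,d\rho    (10)
            = \sum_i \mathrm{Tr}\!\left( E_i \int \rho\,\pi(\rho)\,d\rho \right) R_i(\sigma)    (11)
            = R(\bar{\rho} : \sigma).    (12)

R is strictly proper, so the unique maximum of R(ρ̄ : σ)
is at σ = ρ̄. □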
Consider instead what would happen if Alice had tried to maximize fidelity
[15], which is not derived from an operational procedure,
but would guarantee Alice's honesty when she knows ρ
exactly. Suppose that Alice knows that ρ is either
|0⟩⟨0| or |+⟩⟨+|, with equal probability. The fidelity between
any σ and a pure state is F(σ, |ψ⟩⟨ψ|) = ⟨ψ|σ|ψ⟩,
so the average fidelity is just:

    \bar{F} = \mathrm{Tr}(\sigma \bar{\rho}),    (13)

where ρ̄ = ½(|0⟩⟨0| + |+⟩⟨+|). To maximize F̄, Alice
would choose the eigenstate of ρ̄ with the largest eigenvalue - not ρ̄ itself.
Thus, while fidelity appears at first like a good measure
of honesty, it does not generally motivate Alice to report
the mean of her distribution. Moreover, it can motivate
her to report a pure state that she knows is not the true
state.
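The contrast between average fidelity and HERS for this two-state ensemble can be sketched numerically. The example assumes Bloch-vector notation, in which |0⟩⟨0| is (0, 0, 1), |+⟩⟨+| is (1, 0, 0), and the mean state ρ̄ has Bloch vector (0.5, 0, 0.5); the reported state σ is described by a unit direction and a length. The function names are hypothetical.

```python
import math

def avg_fidelity(r_mean, n_hat, length):
    """Average fidelity Tr(sigma rho_bar), Eq. (13), for sigma with Bloch vector length * n_hat."""
    dot = sum(a * b for a, b in zip(r_mean, n_hat))
    return 0.5 * (1.0 + length * dot)

def hers_reward(r_mean, n_hat, length, C=0.0, D=1.0):
    """Ensemble-averaged HERS payoff R(rho_bar : sigma) = C + D * sum_i p_i ln s_i."""
    dot = sum(a * b for a, b in zip(r_mean, n_hat))
    total = C
    for sign in (+1, -1):
        p_i = 0.5 * (1 + sign * dot)     # Bob's outcome probability under rho_bar
        s_i = 0.5 * (1 + sign * length)  # eigenvalue of the reported sigma
        if p_i == 0.0:
            continue
        if s_i == 0.0:
            return -math.inf  # a possible outcome was declared impossible
        total += D * p_i * math.log(s_i)
    return total

r_mean = (0.5, 0.0, 0.5)  # mean of the Bloch vectors (0,0,1) and (1,0,0)
norm = math.sqrt(sum(x * x for x in r_mean))
n_hat = tuple(x / norm for x in r_mean)

# Report 1: the mean state itself. Report 2: its top eigenstate (a pure state).
fid_mean, fid_pure = avg_fidelity(r_mean, n_hat, norm), avg_fidelity(r_mean, n_hat, 1.0)
hers_mean, hers_pure = hers_reward(r_mean, n_hat, norm), hers_reward(r_mean, n_hat, 1.0)
print("fidelity:", fid_mean, fid_pure)
print("HERS:    ", hers_mean, hers_pure)
```

In this sketch the pure report scores higher on average fidelity (about 0.854 against 0.75), whereas under HERS it earns minus infinity and the mean state is preferred.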

This is not simply a different definition of honesty. An
experimentalist who reports a pure state is predicting
that some event will never occur. If Alice reports |0⟩, she
is asserting that no measurement will ever yield |1⟩. In
the presence of any uncertainty whatsoever, this is at
best misleading, and at worst an outright lie.

This illustrates that HERS strongly penalizes over-optimism.
If Bob obtains an outcome for which
⟨f_i|σ|f_i⟩ = 0, Alice will lose infinitely much money! A
truly zero-probability event is one against which a gambler
would bet infinite money, at arbitrarily bad odds.
Reporting p = 0 for an event that could conceivably happen
is infinitely misleading, and should be discouraged.



The Informed Experimentalist: How should Alice
use the results of measurements (that she has performed)
to reduce her uncertainty? Suppose that she
has performed POVM measurements on N copies of ρ,
where the ith result corresponds to a positive operator
E_i. She knows two things:

1. ρ is selected at random from an ensemble described
by π_0(ρ).

2. Through experiments on copies of ρ, she
has obtained a measurement record M =
{E_1, E_2 … E_N}.

Suppose that she reports σ_j when M_j occurs, and
is paid according to a SPSR where R(ρ : σ) =
Σ_i Tr(|f_i⟩⟨f_i|ρ) R_i(σ).

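Since ρ appears with probability π_0(ρ), the event "ρ
appeared, M_j was measured, and σ_j was reported" occurs
with probability π_0(ρ) p(M_j|ρ). Alice's expected
reward over all possible events is:

    \bar{R} = \sum_j \int \left( \sum_i \mathrm{Tr}(|f_i\rangle\langle f_i|\rho)\,R_i(\sigma_j) \right) p(M_j|\rho)\,\pi_0(\rho)\,d\rho
            = \sum_{i,j} \mathrm{Tr}\!\left( |f_i\rangle\langle f_i| \int \rho\,p(M_j|\rho)\,\pi_0(\rho)\,d\rho \right) R_i(\sigma_j).

We rewrite this using

    p_j = \int p(M_j|\rho)\,\pi_0(\rho)\,d\rho, and    (14)
    \bar{\rho}_j = \frac{1}{p_j} \int \rho\,p(M_j|\rho)\,\pi_0(\rho)\,d\rho,    (15)

to get

    \bar{R} = \sum_j p_j \sum_i \mathrm{Tr}(|f_i\rangle\langle f_i|\bar{\rho}_j)\,R_i(\sigma_j) = \sum_j p_j\,R(\bar{\rho}_j : \sigma_j).

R is strictly proper, so by setting σ_j = ρ̄_j we uniquely
maximize each term in the sum, and Equation (15) defines
the optimal estimate of ρ, given M_j.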

Equation (15) is nothing other than Bayes' Rule. Thus,
the mean of a Bayesian-inferred distribution over states
is the unique optimal estimate - for any strictly proper
reward scheme. We formalize this in the following theorem:

Theorem 4. If ρ is drawn from an ensemble π_0(ρ), and a
measurement record M with conditional probability p(M|ρ) is
observed, then every strictly proper scoring rule R(ρ : σ)
is maximized by:

1. Using Bayes' Rule and Born's Rule:

    \pi_M(\rho) = \frac{p(M|\rho)\,\pi_0(\rho)}{\int d\rho\,\pi_0(\rho)\,p(M|\rho)}
                = \frac{\prod_{i=1}^{N} \mathrm{Tr}(E_i \rho)\,\pi_0(\rho)}{\int d\rho\,\prod_{i=1}^{N} \mathrm{Tr}(E_i \rho)\,\pi_0(\rho)}

2. Reporting the mean of π_M(ρ).

This applies not only to relative entropy, our preferred 
measure of honesty, but to any honesty-guaranteeing re- 
ward scheme. We conclude that Bayesian inference is the 
unique solution to honest state estimation. 
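The two steps of Theorem 4 can be sketched for a single qubit. This is a minimal illustration under stated assumptions, not a general implementation: the prior π_0 is approximated by equal weights on a coarse cubic grid of Bloch vectors inside the unit ball, and each recorded POVM effect is a projector E = (I + m·σ)/2 along a unit axis m, so that Tr(Eρ) = (1 + m·r)/2. The grid, the measurement record, and the function names are hypothetical.

```python
import itertools
import math

def born_probability(axis, r):
    """Tr(E rho) for the qubit effect E = (I + axis.sigma)/2 and a state with Bloch vector r."""
    return 0.5 * (1.0 + sum(m * x for m, x in zip(axis, r)))

def bayesian_mean_estimate(prior_points, prior_weights, record):
    """Posterior weights via Bayes' Rule and the posterior mean Bloch vector."""
    posterior = []
    for r, w in zip(prior_points, prior_weights):
        likelihood = 1.0
        for axis in record:  # product over the measurement record (Born's Rule)
            likelihood *= born_probability(axis, r)
        posterior.append(w * likelihood)
    norm = sum(posterior)  # discrete stand-in for the normalizing integral
    posterior = [w / norm for w in posterior]
    mean = tuple(sum(w * r[k] for w, r in zip(posterior, prior_points)) for k in range(3))
    return mean, posterior

# Discrete stand-in for the prior: grid points inside the Bloch ball, equal weights.
steps = [k / 4 for k in range(-4, 5)]
points = [r for r in itertools.product(steps, repeat=3) if sum(x * x for x in r) <= 1.0]
weights = [1.0 / len(points)] * len(points)

# Hypothetical record: three measurements along z, all yielding the "up" effect.
record = [(0.0, 0.0, 1.0)] * 3
mean, posterior = bayesian_mean_estimate(points, weights, record)

# With a flat prior, the most probable grid point is the maximum-likelihood state.
ml_point = points[posterior.index(max(posterior))]
mean_length = math.sqrt(sum(x * x for x in mean))
print("Bayesian mean:", mean, "length:", mean_length)
print("maximum-likelihood point:", ml_point)
```

On this record the maximum-likelihood point is the pure state |0⟩⟨0|, while the Bayesian mean points in the same direction but has Bloch length strictly less than one, so it assigns nonzero probability to every outcome.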

Other procedures will not optimize any measure of
honesty derived from a strictly proper scoring rule. Alternative
measures of honesty will either (a) in some circumstances,
motivate an experimentalist to flat-out lie about
the state, or (b) not be operationally implementable (e.g.,
fidelity). Our previous discussion of fidelity illustrates
that a non-operational metric that guarantees the honesty
of a knowledgeable experimentalist can fail dramatically
in the face of uncertainty.

Information theorists have previously interpreted relative
entropy as a good measure of two states' distinguishability
[16, 17] - indeed, as the only meaningful one
in the limit of many copies. We have invoked the Likelihood
Principle rather than the many-copy limit, but we
can easily allow Bob to jointly measure N copies of ρ.
He must then apply a SPSR to the result, and Alice can
expect a reward R(ρ^{⊗N} : σ^{⊗N}). As N → ∞, relative
entropy remains meaningful, unlike other measures (e.g.,
the Brier score [10]).

In the presence of uncertainty, pure (or rank-deficient) 
states are infinitely dishonest estimates. Estimating a 
pure state means predicting that some event will never 
happen. Anyone taking such a prediction seriously would 
be justified in betting infinitely much money, at arbitrar- 
ily bad odds, against that event - and should therefore 
expect to lose infinitely much, if the estimate is incorrect. 
This should be discouraged. 

Bayesian state estimation has been discussed previously,
especially for pure states. The
predominance of other methods, such as maximum likelihood,
in the current literature indicates that it has not received the attention
it deserves. Our goal in this letter is to provide a concrete
and compelling argument for Bayesian state estimation
- and to call attention to the problematic implications of
pure-state estimates.
APPENDIX A: FURTHER RESTRICTING THE
REWARD FUNCTION

We show that the quantum reward function is more
tightly constrained than the classical one: the constants
C_i of Theorem 1 must all be equal. We can assume that
D = 1. Assuming a reward function of the form specified
by Theorem 1, the inequality R(ρ : σ) ≤ R(ρ : ρ) is then
equivalent to

    S(\rho\|\sigma) + \sum_i C_i (r_i - p_i) \ge 0.    (A1)



Let U(t) be a smooth curve in the unitary group defined
on a neighborhood of t = 0 and such that U(0) = I. Also
let σ(t) = U(t) ρ U†(t) and

    g(t) = S(\rho\|\sigma(t)) + \sum_i C_i \left[ r_i - \mathrm{Tr}\!\left( \rho\,U(t)|e_i\rangle\langle e_i|U^\dagger(t) \right) \right]

be the function defined by substituting σ(t) into the expression
of Eq. (A1). (Here ρ = Σ_i r_i |e_i⟩⟨e_i|, so that
|f_i⟩ = U|e_i⟩.) Differentiating gives

    \frac{d}{dt} S(\rho\|\sigma(t)) = -\mathrm{Tr}\!\left[ \rho \left( \dot{U} (\ln\rho) U^\dagger + U (\ln\rho) \dot{U}^\dagger \right) \right],    (A2)

where U̇ = dU/dt and the dependence of U on t has
been suppressed. Because U̇ + U̇† = 0 when t = 0,
d/dt S(ρ‖σ(t))|_{t=0} = 0. Likewise, ṗ_i equals 0 when t = 0.
We must therefore determine the second derivative of
g at t = 0 and show that for a suitable choice of
curve, this derivative is negative. In that case, g(t) =
g(0) + g̈(0) t²/2 + O(t³) with g(0) = 0 and g̈(0) < 0, implying that g(t)
is negative for sufficiently small t.
So, differentiating again, we find

    \left. \frac{d^2}{dt^2} S(\rho\|\sigma(t)) \right|_{t=0} = 2\,\mathrm{Tr}\!\left( [X,\rho] (\ln\rho) X \right),    (A3)

where X is a Hermitian matrix such that U̇ = iX. We
have made use of the identity Ü + Ü† = −2 U̇ U̇† that
can be proved by differentiating U U† = I twice at t = 0. Because
S(ρ‖σ(t)) ≥ 0, the expression in Eq. (A3) must also be
nonnegative.

Differentiating the second term of g(t), we find that

    \left. \frac{d^2 g}{dt^2} \right|_{t=0} = 2\,\mathrm{Tr}\!\left( [X,\rho] (\ln\rho + B) X \right),    (A4)

where B = Σ_i C_i |e_i⟩⟨e_i|.

We will now show that all the C_i must be equal. Assume
without loss of generality that C_1 ≠ C_2. Let
r_j = exp(−(C_j + 2 ln 2)/2) for j = 1, 2 and r_3 = 1 − r_1 − r_2.
With these choices, r_1, r_2 < 1/2, making ρ a density operator.
Now write ρ̃ and B̃ for the restriction of ρ and
B to the span of the first two eigenvectors of ρ. Observe
that there exists a choice of X also with support only on
this subspace such that Tr[[X, ρ̃](ln ρ̃)X] > 0.

With these choices, ln ρ̃ + B̃ = −ln ρ̃ − (2 ln 2) I on this subspace.
Noting that Tr[[X, ρ̃]X] = 0, Eq. (A4) becomes

    \ddot{g}(0) = -2\,\mathrm{Tr}\!\left( [X,\tilde{\rho}] (\ln\tilde{\rho}) X \right) < 0.    (A5)

Because g̈(0) < 0 contradicts the requirement of Eq. (A1) that
g(t) ≥ 0, we conclude that C_i = C_1 for all i. □